AITopics | test-time training

test-time training

In-Place Test-Time Training

Feng, Guhao, Luo, Shengjie, Hua, Kai, Zhang, Ge, He, Di, Huang, Wenhao, Cai, Tianle

arXiv.org Machine Learning | Apr-8-2026

The static "train then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast weights) at inference time, yet its potential in the current LLM ecosystem is hindered by critical barriers including architectural incompatibility, computational inefficiency and misaligned fast weight objectives for language modeling. In this work, we introduce In-Place Test-Time Training (In-Place TTT), a framework that seamlessly endows LLMs with Test-Time Training ability. In-Place TTT treats the final projection matrix of the ubiquitous MLP blocks as its adaptable fast weights, enabling a "drop-in" enhancement for LLMs without costly retraining from scratch. Furthermore, we replace TTT's generic reconstruction objective with a tailored, theoretically-grounded objective explicitly aligned with the Next-Token-Prediction task governing autoregressive language modeling. This principled objective, combined with an efficient chunk-wise update mechanism, results in a highly scalable algorithm compatible with context parallelism. Extensive experiments validate our framework's effectiveness: as an in-place enhancement, it enables a 4B-parameter model to achieve superior performance on tasks with contexts up to 128k, and when pretrained from scratch, it consistently outperforms competitive TTT-related approaches. Ablation study results further provide deeper insights on our design choices. Collectively, our results establish In-Place TTT as a promising step towards a paradigm of continual learning in LLMs.
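The chunk-wise fast-weight idea in this abstract can be illustrated with a small pure-Python sketch. It is not the authors' algorithm: the squared-error loss is a generic stand-in for the paper's next-token-prediction-aligned objective (which the abstract does not spell out), and `matvec`, `chunkwise_fast_weight_update`, the chunk size, and the learning rate are all hypothetical names and values chosen for illustration. The sketch only shows the control flow: each chunk is processed with weights that have seen earlier chunks, then those weights take one gradient step on the current chunk.

```python
# Toy sketch only. A generic squared-error loss stands in for the paper's
# objective, and all names and hyperparameters here are assumptions.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def chunkwise_fast_weight_update(W_init, hidden, targets, chunk_size=2, lr=0.25):
    """Run a sequence chunk by chunk, updating a copy of W after each chunk."""
    W = [row[:] for row in W_init]  # fast weights; the initial weights stay untouched
    outputs = []
    for start in range(0, len(hidden), chunk_size):
        hs = hidden[start:start + chunk_size]
        ts = targets[start:start + chunk_size]
        # 1) Forward pass with weights that have only seen earlier chunks.
        preds = [matvec(W, h) for h in hs]
        outputs.extend(preds)
        # 2) One gradient step on the mean squared error over this chunk.
        #    preds were computed before the update, so this is a batch step.
        n = len(hs)
        for h, t, p in zip(hs, ts, preds):
            for i in range(len(W)):
                err = p[i] - t[i]
                for j in range(len(h)):
                    W[i][j] -= lr * 2 * err * h[j] / n
    return outputs, W


# Usage: four identical inputs whose target is 1.0, starting from zero weights.
W_slow = [[0.0, 0.0]]
hidden = [[1.0, 0.0]] * 4
targets = [[1.0]] * 4
outputs, W_fast = chunkwise_fast_weight_update(W_slow, hidden, targets)
```

In this toy run the second chunk is predicted better than the first because the weights were adapted on the first chunk, which is the behavior the chunk-wise mechanism is meant to provide.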

arxiv preprint arxiv, large language model, machine learning, (16 more...)

2604.06169

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Test-Time Training with Masked Autoencoders

Neural Information Processing Systems | Feb-11-2026, 16:00:39 GMT

In this paper, we use masked autoencoders for this one-sample learning problem.

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

Country:

North America > United States (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

4267d84ca2f6fbb4aa5172b76b433aca-Paper-Conference.pdf

Neural Information Processing Systems | Feb-11-2026, 00:36:12 GMT

distribution shift, segmentation, video, (15 more...)

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

OST: Improving Generalization of DeepFake Detection via One-Shot Test-Time Training

Neural Information Processing Systems | Feb-11-2026, 00:18:27 GMT

Such a weak generalization capability hinders the applicability of current deepfake detectors.

artificial intelligence, dfdcdfddf1, machine learning, (17 more...)

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre: Research Report (0.46)

Industry: Information Technology > Security & Privacy (0.52)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)

b618c3210e934362ac261db280128c22-Paper.pdf

Neural Information Processing Systems | Feb-10-2026, 20:37:25 GMT

adaptation, international conference, learning, (12 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering

Neural Information Processing Systems | Feb-9-2026, 17:25:21 GMT

Deploying models on target domain data subject to distribution shift requires adaptation.

artificial intelligence, machine learning, protocol, (17 more...)

Country: Asia > China (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Test-Time Training with Masked Autoencoders

Neural Information Processing Systems | Dec-25-2025, 03:32:42 GMT

Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision. In this paper, we use masked autoencoders for this one-sample learning problem. Empirically, our simple method improves generalization on many visual benchmarks for distribution shifts. Theoretically, we characterize this improvement in terms of the bias-variance trade-off.
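The per-input adaptation loop described here can be sketched in a few lines of pure Python. This is a toy stand-in, not a masked autoencoder: the "model" is a single scalar `w`, and the self-supervised task hides each element `x[i]` in turn and predicts it as `w * x[i-1]`. The names `masked_prediction_grad` and `adapt_to_test_input`, along with the step count and learning rate, are hypothetical. The point is the pattern: copy the trained model, take a few gradient steps on a label-free loss computed from the single test input, then use the adapted copy for that input only.

```python
import copy


def masked_prediction_grad(model, x):
    """Gradient w.r.t. w of the mean squared error when each x[i] (i >= 1)
    is hidden in turn and predicted from its left neighbor as w * x[i-1]."""
    w = model["w"]
    n = len(x) - 1
    return sum(2 * (w * x[i - 1] - x[i]) * x[i - 1] for i in range(1, len(x))) / n


def adapt_to_test_input(model, x, steps=10, lr=0.05):
    """Return a copy of the model adapted to one test input; the original is untouched."""
    adapted = copy.deepcopy(model)
    for _ in range(steps):
        adapted["w"] -= lr * masked_prediction_grad(adapted, x)
    return adapted


# Usage: the trained model assumes neighbors are equal (w = 1), but this
# test input doubles at every position, so the adapted copy moves toward w = 2.
trained = {"w": 1.0}
x_test = [1.0, 2.0, 4.0, 8.0]
adapted = adapt_to_test_input(trained, x_test)
```

Because the adaptation starts from a fresh copy each time, one unusual test input cannot contaminate the predictions made for the next one.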

masked autoencoder, name change, test-time training, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.90)

Test-time Training for Matching-based Video Object Segmentation

Neural Information Processing Systems | Dec-24-2025, 21:32:03 GMT

The video object segmentation (VOS) task involves the segmentation of an object over time based on a single initial mask. Current state-of-the-art approaches use a memory of previously processed frames and rely on matching to estimate segmentation masks of subsequent frames. Lacking any adaptation mechanism, such methods are prone to test-time distribution shifts. This work focuses on matching-based VOS under distribution shifts such as video corruptions, stylization, and sim-to-real transfer. We explore test-time training strategies that are agnostic to the specific task as well as strategies that are designed specifically for VOS.

matching-based video object segmentation, name change, test-time training, (5 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Vision (0.66)

Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models

Hübotter, Jonas, Wolf, Patrik, Shevchenko, Alexander, Jüni, Dennis, Krause, Andreas, Kur, Gil

arXiv.org Artificial Intelligence | Dec-12-2025

Many standard TTT methods train on carefully selected data from the pre-training dataset (i.e., do not add any new privileged information; Hardt & Sun, 2024; Hübotter et al., 2025), and several works studied how to optimally select data for imitation, e.g., the early seminal work of MacKay (1992) and recent extensions (Hübotter et al., 2024; Bagatella et al., 2025b). TTT has also been extended from supervised learning to reinforcement learning (Zuo et al., 2025; Bagatella et al., 2025a; Diaz-Bone et al., 2025). So far it has not been well understood why and when TTT is effective. While many different methods have been proposed for TTT, we focus here on analyzing "semi-parametric" TTT (e.g., Hardt & Sun, 2024; Hübotter et al., 2025), where a pre-trained model is fine-tuned with a supervised loss on a small neighborhood of the test point in the training data. This is different from some other methods for test-time "adaptation", which are commonly applied with distribution shifts (e.g., Wang et al., 2021; Zhang et al., 2022; Durasov et al., 2025). Basu et al. (2023) consider a similar setting to ours, but analyze it through the lens of non-parametric estimation, relying on the smoothness of the target function in the feature space Ψ.
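The "semi-parametric" setting described above (fine-tuning a pre-trained model with a supervised loss on a small neighborhood of the test point in the training data) can be sketched as follows. This is a minimal illustration under assumptions, not the method analyzed in the paper: the model is a one-dimensional linear regressor, the neighborhood is chosen by absolute distance, and `nearest_neighbors`, `fine_tune`, `ttt_predict`, and all hyperparameters are hypothetical.

```python
def nearest_neighbors(train, x_test, k):
    """Return the k training pairs (x, y) whose x is closest to x_test."""
    return sorted(train, key=lambda xy: abs(xy[0] - x_test))[:k]


def fine_tune(w, b, data, lr=0.05, steps=500):
    """Gradient descent on mean squared error for the model y = w * x + b."""
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        gb = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * gw, b - lr * gb
    return w, b


def ttt_predict(w, b, train, x_test, k=3):
    """Fine-tune a copy of (w, b) on the neighborhood of x_test, then predict."""
    local = nearest_neighbors(train, x_test, k)
    w_loc, b_loc = fine_tune(w, b, local)
    return w_loc * x_test + b_loc


# Usage: y = |x| cannot be fit by one global line, but it is locally linear.
train = [(x, abs(x)) for x in range(-3, 4)]
w0, b0 = fine_tune(0.0, 0.0, train)          # "pre-training" on all the data
global_pred = w0 * 2 + b0                    # global model at x = 2
local_pred = ttt_predict(w0, b0, train, 2)   # specialized model at x = 2
```

The toy mirrors the "specialization after generalization" framing of the title: the global fit spreads its limited capacity over the whole input range, while the locally fine-tuned copy only has to be right near the test point.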

artificial intelligence, machine learning, neighborhood, (16 more...)

2509.2451

Country: Europe > Switzerland (0.28)

Genre: Research Report > New Finding (0.93)

Technology: